CFE - A System for Testing, Evaluation and Machine Learning of UIMA Based Applications
Authors
Abstract
There is a vast quantity of information available in unstructured form, and the academic and scientific communities are increasingly looking into new techniques for extracting key elements, that is, finding the structure in the unstructured. There are various ways to identify and extract this type of data; one leading system, which we will focus on, is the UIMA framework. Tasks that are often desirable to perform with such data after it has been identified are testing, correctness verification (evaluation) and model building for machine learning systems. In this paper, we describe a new Open Source tool, CFE, which has been designed to assist in both model building and evaluation projects. In our environment, we used CFE extensively for building intricate machine learning models, for running parameter-tuning experiments on UIMA components, and for evaluating a hand-annotated "gold standard" corpus against annotations automatically generated by a complex UIMA-based system. CFE provides a flexible yet powerful language for working with the UIMA CAS (the results of UIMA processing) to enable the collection and classification of resultant data. We describe the syntax and semantics of the language, as well as some prototypical, real-world use cases for CFE.
Similar resources
ClearTK 2.0: Design Patterns for Machine Learning in UIMA
ClearTK adds machine learning functionality to the UIMA framework, providing wrappers to popular machine learning libraries, a rich feature extraction library that works across different classifiers, and utilities for applying and evaluating machine learning models. Since its inception in 2008, ClearTK has evolved in response to feedback from developers and the community. This evolution has fol...
Fault diagnosis in a distillation column using a support vector machine based classifier
Fault diagnosis has always been an essential aspect of control system design. This is necessary due to the growing demand for increased performance and safety of industrial systems. The support vector machine classifier is a new technique based on statistical learning theory and is designed to reduce structural bias. Support vector machine classification in many applications in v...
Exploration of the Customized Fixtures for the Evaluation of Three-point Bending Strength of Dental Resin Composites
Introduction: This study aimed to devise customized fixtures for the evaluation of three-point bending strength (TPBS) of resin-based dental composites (RBCs). Materials and Methods: A cube-shaped jig made out of wood with dimensions of 105×105×101 mm was prepared in this study. A 20-mm-diameter hole was made in the center of the wooden jig. In addition, a stai...
An Ensemble-Learning-Based Algorithm for Learning to Rank in Information Retrieval
Learning to rank refers to machine learning techniques for training a model in a ranking task. Learning to rank has been shown to be useful in many applications of information retrieval, natural language processing, and data mining. Learning to rank can be described by two systems: a learning system and a ranking system. The learning system takes training data as input and constructs a ranking ...
Combination of Rule-based and Machine Learning for Biomedical Event Extraction
This paper describes a method for biomedical event extraction. Biomedical events occur relative to biomedical concepts (objects) such as proteins and genes. In this work, we try a hybrid method to identify given event types relative to a given set of proteins in biomedical text. The approach combines rule-based and machine learning methods. A set of rules is built based on event triggers, and a set o...
Publication date: 2008